Phylogeography of Nasutitermes ephratae (Termitidae: Nasutitermitinae) in neotropical region

The neotropical region ranks third in the number of termites and includes five different families. Of these, Termitidae is the most diverse and includes the species Nasutitermes ephratae, which is widespread in the neotropics. To date, only one study has been published about phylogeography in neotropical termites (N. corniger). Here, we explored the population genetic patterns of N. ephratae and also evaluated the phylogeographical processes involved in the evolutionary history of the species. We used the mitochondrial genes 16S rRNA and COII as molecular markers: these were sequenced for 128 samples of N. ephratae. We estimated the genetic diversity and divergence time as well as the demography and genetic structure. We also performed an ancestral area reconstruction and a haplotype network. The results showed high genetic variability, recent demographic expansion, and strong genetic structure. A dispersal route for the species, that occurred in both directions between South and Central America, was inferred. The results emphasize a temporary separation between the South and Central America populations that affected the origin of the current Central America populations. These populations were formed from different phylogeographic histories.

The neotropics host a large diversity of species and habitats that arose from the complex geological history occurring along environmental and climatic changes. Evolutionary phenomena such as vicariance, dispersion, and extinction shaped the geographical distribution patterns of the species found in this region 1 .
The neotropical termite fauna ranks third in the number of species. Of the five neotropical termite families, Termitidae is the most diverse and includes the subfamily Nasutitermitinae with 171 species described in the neotropical region, including 67 in the genus Nasutitermes-corresponding to almost 40% of the number of species of the subfamily 2 .
Like the other Nasutitermitinae, the soldiers of this genus are characterized by a conic-shaped frontal projection with the opening of an exocrine gland located at the top. This gland produces substances used for defense against predators 3 .
Nasutitermes ephratae (Fig. 1) was described by Holmgren 4 using alates and workers collected in Ephrata, Suriname. Banks 5 described the soldiers using specimens of N. creolina collected in Panama. Later Snyder 6 synonymized N. creolina with N. ephratae. The nests of this species are arboreal and show a light to dark brown coloration and a leathery surface as if they were enveloped. Internally, the nests are reinforced around the royal camara that harbors the queen and the king in the center of the nest (closer to the trunk or branch of the tree) 7 .
To date, only one phylogeographic study addressed neotropical termites, focusing on N. corniger 15 . The results showed high variability and strong genetic structure for the populations sampled. These were divided in haplogroups along its occurrence area associated with South American biomes. The authors also proposed a dispersal route for N. corniger, which would have left Central America towards South America, where the populations dispersed toward the eastern regions. www.nature.com/scientificreports/ Phylogeographic and biogeographic studies help to explain how the species responded to geological and climatic changes over time. This fact highlights the importance of phylogeographic studies that present a variety of taxonomic groups, including insects, that have population and evolutionary dynamics different from other taxa 16 . Thus, the association of various phylogeographic data can help us to understand the evolutionary history of neotropical biota.
The main aim of this study was to explain the phylogeographical and historical processes that gave rise to population patterns of N. ephratae in the neotropics. For this purpose, we leveraged the mitochondrial molecular markers 16S rRNA and the subunit II of the code gene to cytochrome oxidase (COII). It is important to highlight that the mitochondrial DNA (mtDNA) have been extensively used for the investigation of phylogeographic issues as well as other evolutionary questions of a large variety of species, including termites 15,[17][18][19][20][21][22][23] .
We performed demographic analyses, estimates of variability and genetic structure, analyses of divergence time, and ancestral area reconstruction. We also performed a haplotype network and proposed a dispersal route for the species.
Sampling. The samples of N. ephratae (Table 1; Fig. 2a) analyzed here have been obtained from the Isoptera collections of the Museu de Zoologia da Universidade de São Paulo (MZUSP) and the University of Florida (UF). The samples were collected in different campaigns and stored in 96% ethanol (MZUSP) and in 85% ethanol (UF) to better preservation of the DNA. Extracted DNA are stored in the Molecular Collection of the Laboratório de Biologia Evolutiva (LaBE) of the Faculdade de Ciências Agrárias e Veterinárias, UNESP (Jaboticabal, SP, Brazil). In addition to these samples, 12 nucleotide sequences of N. ephratae obtained in GenBank (public access) were included in the analyses, totaling 128 samples analyzed. Laboratory procedures. The total DNA was extracted from the head of an individual per colony following Liu and Beckenbach 25 protocol, which includes phenol-chloroform for the extraction and 100% and 70% ethanol for the DNA washing. The amplification of the 16S rRNA and COII (gene regions of mtDNA) was performed by PCR following the conditions described in Supplementary Table S1. The PCR reaction was composed by 0.6 pmol/μL of each primer (forward and reverse) [26][27][28] , PCR Master Mix (Promega) 1x, nuclease free water, and 2.4 ng/μL of target DNA. The PCR product was purified using the Wizard ® SV Gel and PCR Clean-Up System (Promega) according to the manufacturer's instructions, followed by Sanger sequencing in ABI 3730 XL DNA Analyzer (Applied Biosystems) automatic sequencer. Data analysis. Genetic diversity and neutrality tests. The nucleotide sequences were read in Chromas Lite v. 2.6.5 (Technelysium Ltd., 2005). The sequences of 16S were aligned using Mafft v. 7 29,30 and the sequences of COII were aligned using Geneious v. 7.1.9 (https:// www. genei ous. com) by ClustalW, both followed by inspection by eye. Geneious also was used to concatenate the 16S and COII sequences of the samples in which both regions could by sequenced.
To quantify the genetic diversity, the following parameters were estimated using DnaSP v. 6 31 : number of polymorphic sites (S), nucleotide diversity (π), average number of nucleotide differences (k), and haplotype diversity (Hd). In addition to these parameters, the value of θ-W per sequence was estimated using Arlequin v. 3.5.1.2 32 . This software also was used to perform the Fu's Fs 33 and Tajima's D 34 neutrality tests-these tests were  The nucleotide composition of the sequences and the pairwise genetic distances (among individuals, among dominions and within dominions) were estimated using MEGA-X 36 (the model Kimura 2-parameter 37 was used to estimate the genetic distances). The genetic distances and the neutrality tests were estimated for both the concatenated sequences (16S + COII) and the single genes (16S or COII). Only the single genes were considered to estimate the genetic diversity indexes.
Haplotype network. The haplotype network was generated with the concatenated sequences (16S + COII) using TCS v. 1.21 38 , that uses parsimony method to establish the relationship among the haplotypes, with 95% of connection limit. The haplogroups were defined based on the shape of the network, considering the distance among the haplotypes and among the tips and central haplotypes of the group. The network was colored using tcsBU 39 considering the frequency of the haplotype in each neotropical dominion sampled.
The map showing the distribution of the samples, colored according to the observed haplogroups, was performed using QGIS v. 3.6.3 40  Genetic structure and Mantel test. The Analysis of Molecular Variance (AMOVA) was performed using Arlequin v. 3.5.1.2 32 to assess the possibility of genetic structure among the sampled populations. Using the concatenated sequences, tree constructions were defined for this analysis: (i) an AMOVA was performed considering the haplogroups of the haplotype network; (ii) the second AMOVA was performed separating the samples in  www.nature.com/scientificreports/ and SOU). The F indexes, calculated by AMOVA, range from 0 to 1 and indicate high differentiation above 0.25 and moderately high differentiation between 0.15 and 0.25 42 . Even to evaluate the inference of genetic structure, we performed an analysis of DNA clustering using the package rhierBAPS 43,44 implemented in R v. 4.0.1 45 . The results were obtained for both the concatenated sequences and the single genes. The graphs showing the geographical distribution of each genetic cluster (considering the concatenated sequences results) were obtained using Microsoft Excel (2008).
The Mantel test was performed using the package vegan 46 , implemented in R v. 4.0.1, considering the Pearson's correlation as method with a number of permutation equal to 10,000. We inputted the concatenated sequences and the geographical coordinates of each sample to perform this analysis.
Analysis of divergence time and ancestral area reconstruction. We estimated the divergence time among the samples of N. ephratae relying on Bayesian inference. The tree was generated in BEAST v. 2.6.3 47 using strict clock and chain length equal to 90 million. Two partitions were included in the analysis, the first containing the 16S sequences and the second containing the COII sequences. The tree models (Fossilized Birth Death Model) 48 and the clock models for both partitions were linked, and the substitution model TrN + I + G were applied for 16S and COII 49 . The selection of the best-fit model of nucleotide substitution was done based on the results of jModelTest 50 Figure S2; Supplementary Table S2). For the MRCA (most recent common ancestor) priors we used exponential distribution. As outgroups, we included the species Mastotermes darwiniensis (Mastotermitidae), Amitermes dentatus (Termitidae: Termitinae), Atlantitermes snyderi (Termitidae: Nasutitermitinae), and Nasutitermes longinasus (Termitidae: Nasutitermitinae).
The MCMC trace files generated by Bayesian inference were viewed and analyzed using Tracer v.  58 .
The tree topology, as well as the divergence times estimated by this analysis, were used for the ancestral area reconstruction analysis, which was performed using the package BioGeoBEARS 59 in R v. 4.0.1 45 . We defined seven occurrence areas for this analysis, according to the neotropical dominions which were sampled (ANT, BOR, CHA, MES, PAC, PAR and SOU). We tested the models DEC, DIVALIKE and BAYAREALIKE with and without j (i.e., six models tested). The best-fit model was selected based on the lower values and higher weights of AIC and AICc.

Results
Genetic diversity and demographic inferences. We obtained 123 nucleotide sequences for 16S [396 base pairs (bp)] and 108 for COII (742 bp) within the samples, totaling 103 specimens with both mtDNA genes sequenced. The nucleotide composition of the concatenated sequences corresponds to 40.27% of adenine (A), 25.35% of thymine (T), 21.20% of cytosine (C), and 13.19% of guanine (G). The mean genetic distance among the samples was 0.011 for the concatenated sequences, 0.007 for the 16S sequences, and 0.015 for COII sequences (the pairwise genetic distances are presented in Supplementary Tables S3, S4, and S5). After estimating the genetic distances among and within the dominions, higher values were obtained for the BOR-MES and BOR-PAC pairwise comparisons ( Table 2). The mean genetic distance within dominions was 0.008.
We found 66 haplotypes considering the concatenated sequences: 41 haplotypes (Hd = 0.890) for 16S and 53 haplotypes (Hd = 0.932) for COII (Table 3a), indicating high genetic variability for the populations sampled. High values also were found for the number of polymorphic sites (S) of both mtDNA genes (24 for 16S and 87 for COII). The COII synonym sites showed the higher value of nucleotide diversity (π), which was 200 times higher than the value observed for the non-synonym sites (Table 3a). The 16S π value was lower than the COII π value, which tends to present more polymorphism.
Regarding to the neutrality tests performed with the whole sample set (Table 3b), the Tajima's D value was not significant. This means that there is not enough evidence to discriminate between demographic expansion or population bottlenecks from these indices. The negative and significant value of Fu's Fs, i.e., the most sensitive among the neutrality tests performed 60 , suggest demographic expansion for the populations, which is also suggested from the negative value of Achaz Y*. Regarding the neutrality tests performed per dominion (Table 3c), the negative and significant values of Fu's Fs also indicate demographic expansion for all the dominions Only the value of Tajima's D for CHA was significantly negative. The results of neutrality tests performed for each gene are presented in Supplementary Table S6. Haplotype network. We generated a haplotype network to reconstruct the relations among the 66 haplotypes found for N. ephratae (Fig. 2b). We observed four haplogroups (Hg) composed, in general, by haplotypes close each other and originated from a more frequent central haplotype. Haplotypes that were very distant from the central haplotype of the haplogroup (i.e., that present many mutational steps) were not grouped. The star shape of the haplogroups suggests recent demographic expansion events for the populations analyzed.  Table S7). The haplogroup 3 is formed mainly by populations located in northern South America (BOR) and in the Antillean islands. Haplogroup 3 also includes haplotypes from eastern Brazil (PAR -Espírito Santo, Brazil), which are intermediaries between this haplogroup and the haplogroup 1. The geographical distribution of the haplogroups is also showed in Fig. 2a.    (Table 4) and rhierBAPS (Fig. 3). Regarding the AMOVA among haplogroups (Table 4a), the F ST value was significantly very high indicating strong differentiation among the haplogroups established in the network (Fig. 2b). A high F ST was also observed in the AMOVA among dominions (Table 4b), showing strong genetic differentiation among the dominions sampled. In the AMOVA among continents (Table 4c), the high F CT also indicates strong differentiation among the populations located in different continents while a moderate differentiation can be observed among the dominions within continents, as showed by F SC . The rhierBAPS results (Fig. 3) showed five genetic clusters (represented by different shades of gray in Fig. 3a) for the N. ephratae samples. The samples of the cluster 1 are distributed in South America; the samples of cluster 2 are distributed in northern South America (Venezuela and Colombia in southern portion of PAC) and Central America (MES dominion). Cluster 3 includes only samples from Central America (MES and northern portion of PAC), and cluster 4 includes samples from BOR, ANT, and mainly from PAC (northern and southern portions) and MES. Cluster 5 includes samples from BOR and southern portion of PAC (Fig. 3b). These results also suggest strong genetic differentiation among the populations located in South and Central America.
Mantel's test was performed to verify the existence of isolation by distance and resulted in a significantly positive r value (0.1197; p = 0.0005), thus indicating that there is a positive correlation between the genetic distance and the geographic distance, i.e., the genetic distances can increase as geographic distance between populations increase. Divergence time and ancestral area reconstruction. The divergence times of the N. ephratae populations were estimated from a Bayesian inference (BI) analysis performed using the haplotypes of the 16S and COII sequences ( Fig. 4; Supplementary Fig. S2). According to the tree obtained, three important cladogenetic events seem to have given rise to the N. ephratae populations analyzed. The first event (6.42 My) separated the Hg 4 + H25-H59 clade (that includes samples from Central America and northern South America) from the other clades -the ancestral node of this clade was dated to 2.19 My by the analysis and the Hg 4 clade was dated For the ancestral area reconstruction, the best-fit model selected, DEC + j, presented AIC and AICc equal to 211.7 and 212.4 with weights equal to 0.85 and 0.84, respectively. The results showed that the common ancestor of all the clades was distributed in MES and PAC dominions and were dispersed posteriorly to the other areas ( Fig. 4; Supplementary Fig. S3).

Discussion
Data on the genetic diversity obtained for the N. ephratae populations showed high genetic variability for the species, as evidenced by the large number of haplotypes in relation to the total number of samples as well as by the high haplotype diversity. As expected, the nucleotide diversity observed for the COII sequences was greater . Bayesian inference tree generated using the haplotypes of N. ephratae, associated to the results of the ancestral area reconstruction. The numbers near to the nodes correspond to the estimated divergence times in million of years (My). The "N" inside the PAC circles (close to the taxa names) are indicating that the respective haplotype was observed in the northern portion of PAC dominion, located in Central America; PAC circles without the "N" are indicating the haplotypes from southern portion of PAC dominion, located in South America. The posterior probabilities and the 95% HPD intervals for this BI are showed in Supplementary  Fig. S2. The map was generated by AFS using the software QGis v. 3.6.3 (http:// qgis. org) 40  www.nature.com/scientificreports/ than the nucleotide diversity found for the 16S. Furthermore, the π value obtained for the COII synonyms sites was higher than the π value for the non-synonym sites (Table 3a). The high genetic variability observed for N. ephratae seems not to be widely shared among the populations, as highlighted by the AMOVA results (Table 4), which showed strong genetic structure for the species. That is, there are a lot of genetic variants that are exclusive of one or few regions suggesting low or moderated gene flow among distant populations. This limitation in gene flow may have occurred due to isolation by distance as confirmed by Mantel's test. The clustering analysis results also corroborated the population structure inference because five clusters with a distribution limited to adjacent dominions were recovered. Cluster 1 shows a larger geographic distribution and occurs in all South American dominions. Great genetic variability with haplotypes little shared among distant geographic regions was also observed very similar to N. corniger 15 .
Regarding to the demographic history of the N. ephratae populations, the neutrality tests (Table 3b,c), specifically Fu's Fs, indicated demographic expansion for the species (considering the whole sample set) and particularly for each neotropical dominion sampled. This also can be inferred from the star shape of the haplogroups in the haplotype network, thus suggesting that a lot of descendent haplotypes had recently risen from the ancestor haplotypes located in the center of the haplogroups.
The haplotype network (Fig. 2b) also showed four haplogroups presenting clear differences about the geographical distribution of the haplotypes. Haplogroup 1 is entirely composed of haplotypes from the South America dominions while haplogroups 2 and 4 are mostly formed by Central American haplotypes. Some haplotypes included in haplogroup 4 can also be found in northern South America (Venezuela and Trinidad and Tobago) but were not often observed at latitudes below this. Haplogroup 3 is composed of haplotypes from northern South America (Trinidad and Tobago and French Guiana) and from the Antillean islands (Dominica and Guadeloupe).
These groups were also recovered by the clustering analysis (Fig. 3a) except for a few differences among them. In general, the results of the analyses are linked as following: haplogroup 1 corresponds to cluster 1; haplogroup 2 corresponds to cluster 3; haplogroup 3 corresponds to cluster 5; and haplogroup 4 corresponds to cluster 4.
The BI analysis ( Fig. 4; Supplementary Fig. S2) shows that some nodes of the tree presented posterior probabilities below 0.50 possibly due to the difficulties of the algorithm, implemented in BEAST, in solving datasets containing very similar sequences, which occurs in many intraspecific analyses 61 . Despite this, large clades of this tree include haplotypes from the same haplogroup (Fig. 4), helping to support the phylogenetic inferences raised by BI. Observing the topology of the tree, the haplotype network, and the ancestral ranges reconstructed, we inferred dispersal events and then proposed a dispersal route for the N. ephratae populations (Fig. 5).
The ancestral populations distributed in MES and PAC ("1" ; Fig. 5a) suffered a temporary separation that split the populations of South America (southern PAC) from the Central American (northern PAC + MES) populations ("2" ; Fig. 5b). The ancestor of the haplogroups 1, 2, and 3 occurred in MES during this separation ("3" ;  Fig. 5c). This leads to the origin of haplogroup 2 that was restricted to Central America (northern PAC + MES). After the reconnection of the N. ephratae populations (indicated with an asterisk in Fig. 5), the Mesoamerican ancestor dispersed to the South American portion of PAC ("5"; Fig. 5e) from where there was a new dispersion to SOU ("6", Fig. 5f) and to ANT-BOR ("7", Fig. 5f). This last dispersion gave rise to haplogroup 3 composed mainly of French Guiana and Antillean populations. The ancestral populations of the haplogroup 1 that had arisen in SOU were widely dispersed to CHA and PAR (Atlantic Forest lato sensu) reaching to BOR and southern PAC ("8", Fig. 5f), but remaining limited to South America.
Still during the temporary separation between the South and Central America N. ephratae populations, the South American ancestor of the haplogroup 4 arose in southern PAC ("4"; Fig. 5d). After the populations' reconnection, there was a dispersal from southern PAC to northern PAC and to MES originating as the late ancestor of haplogroup 4 and clade H25-A59, which were restricted to these dominions ("5" ; Fig. 5e).
Based on the dispersal route proposed, we inferred that the N. ephratae populations currently distributed in Central America arose from distinct dispersal events. This becomes clearer when we also observe the geographic distribution of haplogroups 2 and 4 ( Fig. 2) whose occurrence areas overlap, but who have different genetic groups. That is, even though these haplogroups are distributed in the same area, they are results of different evolutionary events. The ancestors of these haplogroups possibly diverged during a temporary split among the ancestral N. ephratae populations from South and Central American between the late Pliocene and early Pleistocene (around 5.06 and 2.78 My; Figs. 4 and 5). Haplogroup 2 originated from populations that were restricted to Central America during this split while the haplogroup 4 originated after the reconnection perhaps arising from populations that were dispersed from the northern South America (southern PAC) to Central America (northern PAC + MES). The same direction of dispersion (South America to Central America) was also identified for the ant species Neoponera villosa 16 dated from 0.46 to 0.28 My; earlier dispersions have been observed for N. ephratae.
Haplogroup 1 is exclusively South American and arose from a Mesoamerican ancestor that dispersed to the northern South America and then to the SOU dominion. Therefore, the dispersal events of the N. ephratae populations among Central and South America occurred in both senses, thus shaping the genetic and phylogeographic patterns observed here.
Although the causes responsible for the temporary split between the South and Central American N. ephratae population are not clear, some hypotheses can be proposed. A population isolation caused by the geographic distance between the populations may have led to this split: The occurrence of isolation by distance was suggested by the Mantel's test. The loss of distribution area could also have caused this effect in the populations. In this way, the reconnection of the populations could have occurred due to the demographic expansion that was detected by the neutrality tests and can be suggested from the star shape of the haplogroups in the haplotype network.
Geological and/or geographic factors could also help to explain this split. Following this approach, it is possible that the separation is related to the tertiary's tectonic and paleographical reorganization movements (in the late Pliocene), which led to the emergence of barriers and changes in dispersal routes in South America 62 ). Moreover, the early Quaternary climate changes (early Pleistocene) was characterized by temperature and dryness oscillations in the continents and was also impacted the adaptability and the migration of species and populations 62 . Although dispersion by water (inside flotsam carried by ocean or river currents or ferried by vessels) helps to explain dispersion patterns for termite species 55,64,65 , it is more likely that the paths taken by the populations of N. ephratae were overland, which makes the PAC dominion an obligatory passage for the dispersal of populations between South and Central America. This dominion harbor peculiarly haplotypes from all haplogroups and present a higher value of intra-dominion genetic distance (Table 2b). These features help to infer the PAC intermediary position for the N. ephratae dispersions.
The dispersal route traced for the populations analyzed here are very similar to the dispersal route observed for N. corniger species 15 including the dispersion from Central to South America, the eastward dispersion on South America lands, and the late occupation of Atlantic Forest. Nevertheless, the dispersal from South to Central America and the temporary split among the population have not yet been detected for N. corniger unlike N. ephratae. However, it is important to note that only the 16S mitochondrial marker was used for the N. corniger analyses 15 . The addition of other markers can lead to a more robust comparison among the population patterns of the two species.
Crews and Esposito 66 also studied dispersal routes and identified that South America is most probably the origin of most of the Caribbean arthropod fauna including N. ephratae, N. corniger, and other species of the genus Nasutitermes. These data contradict the inferences raised by Santos et al. 15 although there are some important methodological differences among the studies. Crews and Esposito 66 analyzed 18 species of Nasutitermes using the same mtDNA genes used here as molecular markers (16S + COII). The dispersal route proposed herein is according to the inferences made by Crews and Esposito 66 because the N. ephratae Antillean populations arose from a South America ancestor according to the ancestral area reconstruction. Specifically, for these island populations, it is possible that the dispersion from northern South America to the Antilles occurred via flotsam or floating wood carried to the islands by oceanic currents. This kind of dispersion has already been detected for Caribbean termite species 65 . Most of the islands' termite species analyzed here are compose of haplogroup 3 www.nature.com/scientificreports/ together with samples from French Guiana (BOR dominion) and Trinidad and Tobago (located in the transition between BOR and PAC). This data suggests high genetic similarity among populations of these areas-although a larger sampling can better clarify the relations among the two dominions.
Regarding the BOR dominion, we found that the samples from this area have haplotypes typically found in other areas, which stayed distant from each other in the haplotype network (except for the haplotypes from French Guiana). This distance was also observed in BI. Some authors argue that the Amazonian biota (composed of parts of BOR and SOU dominions and the southeastern Amazon dominion-the latter was not sampled herein) have unnatural biogeographic origins [67][68][69][70][71] . As a speculation, this hypothesis could help clarify the existence of haplotypes that are genetically distant from each other in Amazonian localities that are geographically close to each other. However, we point that it is necessary to include a greater sampling of this region to clearly detect the genetic population patterns and the evolutionary events involving N. ephratae in this dominion.
In general, there are many questions to be explored about the phylogeographic processes of the South American and neotropical species especially for termites whose studies are still new. Better sampling of N. ephratae and/or including new molecular markers in future studies, as well as addressing phylogeographic issues of other species, can help solidify the inferences made in this work and can expand the understanding of the evolutionary history of the group and of the neotropics.

Conclusions
This study considered most area where N. ephratae occurs in the neotropics. It was possible to make important inferences about the general panorama of the evolutionary history of the species in this region although a broader sampling, especially from central-eastern South America, could better clarify some phylogeographic patterns. Our data also showed similarities on the population and dispersal patterns among N. ephratae and N. corniger. Here, it is possible to speculate that both species responded similarly to the biogeographic processes that have occurred in the neotropics although new comparative studies may better answer this question. In summary, this study offers important contributions to the understanding of biogeographic and phylogeographic issues in the neotropics especially evolutionary studies of termites and other insects.